
Trainer + Multi image v0.1.0 #41

Merged

Blaizzy merged 79 commits into main from pc/tuner on Oct 11, 2024
Conversation

Blaizzy (Owner) commented Jun 15, 2024

This PR adds:

  • LoRA and QLoRA fine-tuning
  • Multi-image support
  • Batch processing
  • Image resizing

New Models

  • Pixtral
  • Qwen2-VL
  • Llava-Interleave

Closes #73 #69
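For context on the first bullet: LoRA fine-tuning freezes the base weights and trains only a low-rank update. A minimal numpy sketch of the idea (hypothetical shapes and names, not the mlx-vlm implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight of a linear layer (out_features x in_features).
W = rng.normal(size=(8, 16))

# LoRA factors: A (r x in) is small random, B (out x r) starts at zero,
# so the adapted layer is initially identical to the base layer.
r, alpha = 4, 8
A = rng.normal(size=(r, 16)) * 0.01
B = np.zeros((8, r))
scale = alpha / r

def lora_linear(x):
    # Base projection plus the low-rank update B @ A, scaled by alpha / r.
    return W @ x + scale * (B @ (A @ x))

x = rng.normal(size=16)
# With B = 0 the adapter is a no-op, matching the frozen layer exactly.
assert np.allclose(lora_linear(x), W @ x)
```

Only `A` and `B` (r * (in + out) values) receive gradients, which is what makes LoRA and its quantized variant QLoRA cheap enough to fine-tune large VLMs locally.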

lin72h commented Jun 15, 2024

It's happening!

Blaizzy (Owner, Author) commented Jun 15, 2024

Absolutely! 🚀

yukiarimo commented
Any updates? Is it usable?

Blaizzy (Owner, Author) commented Sep 20, 2024

Hey @yukiarimo

It's almost done.

I just want to run some extra tests (QLoRA, full fine-tune) and finish Qwen2-VL before releasing it.

@Blaizzy Blaizzy marked this pull request as ready for review September 28, 2024 15:20
@Blaizzy Blaizzy changed the title from Trainer to Trainer + Multi image v0.1.0 on Oct 5, 2024
@Blaizzy Blaizzy merged commit ae66c0b into main Oct 11, 2024
@Blaizzy Blaizzy deleted the pc/tuner branch October 25, 2024 19:46
Garry-TI pushed a commit to Garry-TI/mlx-vlm that referenced this pull request Sep 23, 2025
* remove torch and mlx-lm

* add peft model creation

* use tree flatten

* add dataset loader

* fix dataset

* fix masks and rename dataset

* support batch processing and train on completions

* fix trainer

* formatting

* add support for none splits and fix assistant id

* Add lora script and docs

* remove duplicates

* fix batch load

* load trained adapters and add super to all models

* fix pixtral quant

* speed up qwen batch processing

* fix qlora training

* fix dataloader

* formatting

* fix pixtral pixel loading

* fix lora and dataset

* add batch processing support for qwen2_vl

* update lora docs

* add unit tests

* set stage for phi3_v support

* update logs and readme

* add utils tests and remove unused collate fn

* refactor prompt utils and add multi-image support for pixtral

* add llava interleave support

* multi image support

* add image resizing

* refactor data loading

* update data processing and tqdm

* add llava interleave

* formatting

* add list of models with multi-image support

* remove trimmed labels

* remove warning

* pin reqs

* add config dict condition

* fix pixtral FT prompt

* formatting images

* remove unused

* update trainer init

* update lora

* update md and formatting

* bump version

* add tests for pixtral and qwen2_vl

* add tests for pixtral

* Merge branch 'pc/tuner' of https://github.com/Blaizzy/mlx-vlm into pc/tuner

* fix test

* remove rope scaling

* remove test args and update MD

* format dataset defaults

* add dataset formatting info

* Fix issues with multiple image handling (Blaizzy#78)

  1. `[IMG_BREAK]` and `[IMG_END]` tokens are lost after embedding.
  2. Image position encoding should be done on a per-image basis:
     https://github.com/mistralai/mistral-inference/blob/main/src/mistral_inference/vision_encoder.py#L85
     https://github.com/huggingface/transformers/blob/main/src/transformers/models/pixtral/modeling_pixtral.py#L492

Co-authored-by: Roger Xu <rogerxu@gmail.com>

* fix styling

* update model

* update default model

* rewrite comments

---------

Co-authored-by: hiima234 <98786318+hiima234@users.noreply.github.com>
Co-authored-by: Roger Xu <rogerxu@gmail.com>
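The per-image position-encoding fix noted in Blaizzy#78 above can be illustrated with a small sketch (pure Python, hypothetical grid sizes; not the actual Pixtral code): patch positions must restart at (0, 0) for every image, rather than continuing across the concatenated patch sequence.

```python
def patch_positions(grids):
    """grids: list of (rows, cols) patch grids, one per image.

    Returns a flat list of (image_index, row, col) triples, with the
    (row, col) coordinates restarting at (0, 0) for each image.
    """
    positions = []
    for img_idx, (rows, cols) in enumerate(grids):
        for r in range(rows):
            for c in range(cols):
                positions.append((img_idx, r, c))
    return positions

# Two images: a 2x3 patch grid followed by a 1x2 patch grid.
pos = patch_positions([(2, 3), (1, 2)])
# The second image's first patch restarts at row 0, col 0,
# instead of inheriting positions continued from the first image.
assert pos[6] == (1, 0, 0)
```

Encoding positions over the concatenated sequence instead would shift every patch of the second and later images, which is the kind of bug the linked commit addresses.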


Development

Successfully merging this pull request may close these issues.

Error running mlx-community/pixtral-12b-4bit

5 participants